Resolve boot issues in hybrid azure images during upgrades from RHEL 7 > 8 > 9. #1284

Open
wants to merge 2 commits into
base: main


dkubek
Member

@dkubek dkubek commented Aug 20, 2024

This PR includes two changes:

Restructure hybrid image detection

Previously, detection of an Azure hybrid image was tightly coupled with the process of converting the grubenv symlink to a regular file. Since there are other issues related to hybrid images, it is worth separating these two concepts.

This commit modifies the CheckHybridImage actor so that it produces a message when either

  1. the system is booted in Legacy mode and WALinuxAgent is detected, or
  2. the system is booted in Legacy mode, an ESP (EFI System Partition) is detected and mounted, and the system is running on the Hyper-V hypervisor
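
The two conditions above can be sketched as a single predicate. This is only an illustration of the decision logic as described in this PR text, not the actor's actual code; the `SystemFacts` structure, its field names, and the string values `"bios"` and `"hyperv"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemFacts:
    """Hypothetical snapshot of the facts the detection relies on."""
    firmware: str           # "bios" for Legacy boot, "efi" for UEFI boot
    has_walinuxagent: bool  # WALinuxAgent was detected on the system
    esp_mounted: bool       # an ESP was detected and is mounted
    hypervisor: str         # e.g. "hyperv", "kvm", "none"


def is_azure_hybrid_image(facts: SystemFacts) -> bool:
    """Return True when either of the two described conditions holds."""
    # Both conditions require the system to be booted in Legacy mode.
    if facts.firmware != "bios":
        return False
    # Condition 1: Legacy boot and WALinuxAgent detected.
    if facts.has_walinuxagent:
        return True
    # Condition 2: Legacy boot, ESP mounted, running on Hyper-V.
    return facts.esp_mounted and facts.hypervisor == "hyperv"
```

Keeping the predicate free of side effects makes it easy to test each condition in isolation before wiring it to real system scanning.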

The new CheckGrubenvToFile actor detects the grubenv symlink on hybrid images and tasks ConvertGrubenvToFile with performing the actual conversion later during the upgrade.
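
As a rough sketch of the check/convert split, the snippet below separates a read-only symlink check from the conversion step. It is not the code from this PR: the function names and the default `/boot/grub2/grubenv` path are assumptions made for illustration.

```python
import os
import shutil

# Assumed location of grubenv; the PR text does not state the path.
GRUBENV_PATH = "/boot/grub2/grubenv"


def grubenv_is_symlink(path=GRUBENV_PATH):
    """Check step: report whether grubenv is a symlink (no changes made)."""
    return os.path.islink(path)


def convert_symlink_to_file(path=GRUBENV_PATH):
    """Conversion step: replace the symlink with a regular file copy.

    Returns True if a conversion happened, False if path is not a symlink.
    """
    if not os.path.islink(path):
        return False
    target = os.path.realpath(path)
    tmp_path = path + ".tmp"
    # Copy the target's content next to the symlink, then atomically
    # replace the symlink itself with the regular file.
    shutil.copyfile(target, tmp_path)
    os.replace(tmp_path, path)
    return True
```

Separating the two steps mirrors the design described above: the check can run early and only report, while the modification is deferred to a later point.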

Check potential boot failures in Azure Gen1 VMs due to invalid grubcfg

This change addresses the issue where the /boot/grub2/grub.cfg file is overwritten during the upgrade process by an old RHEL 7 configuration left over on the system, causing the system to fail to boot.

The problem occurs on hybrid Azure images. It is caused by one of the scriptlets in grub-efi, which during the upgrade overwrites the current configuration in /boot/grub2/grub.cfg with an old configuration from /boot/efi/EFI/redhat/grub.cfg.

The issue is detected and fixed specifically on Azure hybrid cloud systems.

If an old configuration is detected, this actor regenerates the grub configuration using grub2-mkconfig -o /boot/grub2/grub.cfg after the RPMs are installed, to ensure the correct boot configuration is in place.
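
A minimal sketch of the regeneration step follows, assuming the detection result is already available as a boolean. The helper names and the injectable `run` parameter are hypothetical; only the `grub2-mkconfig -o /boot/grub2/grub.cfg` command comes from the description.

```python
import subprocess

GRUB_CFG_PATH = "/boot/grub2/grub.cfg"


def build_mkconfig_cmd(output=GRUB_CFG_PATH):
    """Build the grub2-mkconfig command named in the PR description."""
    return ["grub2-mkconfig", "-o", output]


def regenerate_grub_cfg(old_config_detected, run=subprocess.check_call):
    """Regenerate grub.cfg only when an old configuration was detected.

    `run` is injectable so the logic can be exercised without touching
    the real bootloader configuration.
    """
    if not old_config_detected:
        return False
    run(build_mkconfig_cmd())
    return True
```

Passing a stub for `run` (for example `calls.append`) lets the decision logic be tested on a machine where `grub2-mkconfig` is not installed.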

An old configuration is detected by looking for a menuentry corresponding to a RHEL 7 kernel, which should not be present on RHEL 8 systems.
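
One way to approximate this detection is to search the grub.cfg text for a `menuentry` whose title contains an `.el7` kernel release string. The exact matching rule used by the actor is not shown here, so the regular expression and the function name below are assumptions for illustration only.

```python
import re

# Assumption: RHEL 7 kernels are identified by ".el7" in the menuentry title,
# e.g. "Red Hat Enterprise Linux Server (3.10.0-1160.el7.x86_64) 7.9 (Maipo)".
_EL7_MENUENTRY = re.compile(
    r"^\s*menuentry\s+['\"][^'\"]*\.el7\b",
    re.MULTILINE,
)


def has_rhel7_menuentry(grub_cfg_text):
    """Return True if the config text has a menuentry for an el7 kernel."""
    return _EL7_MENUENTRY.search(grub_cfg_text) is not None
```

Because the pattern is anchored at the start of a line, commented-out entries (lines beginning with `#`) are not matched.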

JIRA: RHEL-38255


Thank you for contributing to the Leapp project!

Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
If you want to request a review or rebuild a package in copr, you can use following commands as a comment:

  • review please @oamg/developers to notify leapp developers of the review request
  • /packit copr-build to submit a public copr build using packit

Packit will automatically schedule regression tests for this PR's build and latest upstream leapp build.
However, here are additional useful commands for packit:

  • /packit test to re-run manually the default tests
  • /packit retest-failed to re-run failed tests manually
  • /packit test oamg/leapp#42 to run tests with leapp builds for the leapp PR#42 (default is latest upstream - master - build)

Note that first time contributors cannot run tests automatically - they need to be started by a reviewer.

It is possible to schedule specific on-demand tests as well. Currently 2 test sets are supported, beaker-minimal and kernel-rt, both can be used to be run on all upgrade paths or just a couple of specific ones.
To launch on-demand tests with packit:

  • /packit test --labels kernel-rt to schedule kernel-rt tests set for all upgrade paths
  • /packit test --labels beaker-minimal-8.10to9.4,kernel-rt-8.10to9.4 to schedule kernel-rt and beaker-minimal test sets for 8.10->9.4 upgrade path

See other labels for particular jobs defined in the .packit.yaml file.

Please open ticket in case you experience technical problem with the CI. (RH internal only)

Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please contact leapp-infra.

@pirat89 pirat89 added the bug Something isn't working label Aug 20, 2024
This commit addresses the issue where the `/boot/grub2/grub.cfg` file is
overwritten during the upgrade process by an old RHEL7 configuration
leftover on the system, causing the system to fail to boot.

The problem occurs on hybrid Azure images, which support both UEFI and
Legacy systems and have both `grub-pc` and `grub-efi` packages installed.
It is caused by one of the scriptlets in `grub-efi` which overwrites the
current configuration with the old one.

If an old configuration is detected, this actor regenerates the grub
configuration using `grub2-mkconfig -o /boot/grub2/grub.cfg` after
installing rpms to ensure the correct boot configuration is in place.

The fix is applied specifically to Azure hybrid cloud systems.

JIRA: RHEL-38255
@dkubek dkubek force-pushed the azure_hybrid_grubcfg_fix branch from 390e3e4 to cc3fa6f Compare August 20, 2024 11:50
@dkubek dkubek marked this pull request as draft August 20, 2024 12:33
@dkubek dkubek force-pushed the azure_hybrid_grubcfg_fix branch 3 times, most recently from 0e3bf53 to 838a95a Compare August 27, 2024 11:22
@dkubek dkubek force-pushed the azure_hybrid_grubcfg_fix branch 2 times, most recently from 14a0302 to 807c270 Compare August 27, 2024 11:56
@dkubek dkubek marked this pull request as ready for review August 28, 2024 11:07
Member

@MichalHe MichalHe left a comment


Looks cool to me, just minor details that require changing.

Previously, detection of an Azure hybrid image was tightly coupled with
the process of converting the grubenv symlink to a regular file. Since
there are other issues related to hybrid images, it is worth separating
these two concepts.

This commit modifies the ScanHybridImage actor so that it produces a
message when WALinuxAgent is detected or we are booted in bios and ESP
partition is mounted and we are running on Hyper-V (sign of a hybrid
image).

The new CheckGrubenvToFile actor is responsible for detecting the
grubenv symlink on hybrid images and tasks ConvertGrubenvToFile, which
is later responsible for the actual conversion.
@dkubek dkubek force-pushed the azure_hybrid_grubcfg_fix branch from 807c270 to 39d83e3 Compare September 10, 2024 14:47
@pirat89 pirat89 added this to the 8.10/9.6 milestone Jan 9, 2025
Member

@pirat89 pirat89 left a comment


lgtm - untested. Keeping it open until someone tests it manually :)

@pirat89 pirat89 removed this from the 8.10/9.6 milestone Jan 17, 2025
Labels
bug Something isn't working
5 participants